Data-driven approach for creating synthetic electronic medical records
Abstract
BACKGROUND New algorithms for disease outbreak detection are being developed to take advantage of full electronic medical records (EMRs), which contain a wealth of patient information. However, because of privacy concerns, even anonymized EMRs cannot be shared among researchers, which makes it very difficult to compare the effectiveness of these algorithms. To bridge the gap between novel bio-surveillance algorithms that operate on full EMRs and the lack of non-identifiable EMR data, we developed a method for generating complete synthetic EMRs.
METHODS This paper describes a novel methodology for generating complete synthetic EMRs both for an outbreak illness of interest (tularemia) and for background records. The method has three major steps: 1) generation of each synthetic patient's identity and basic information; 2) identification of the care patterns that the synthetic patients would receive, based on the information present in real EMR data for similar health problems; and 3) adaptation of these care patterns to the synthetic patient population.
RESULTS We generated EMRs, including visit records, clinical activity, laboratory orders/results and radiology orders/results, for 203 synthetic tularemia outbreak patients. Validation of these records by a medical expert revealed problems in 19% of them; these were subsequently corrected. We also generated background EMRs for over 3000 patients in the 4-11-year age group. Validation of those records by a medical expert revealed problems in fewer than 3% of them, and these errors were also corrected.
CONCLUSIONS A data-driven method was developed for generating fully synthetic EMRs. The method is general and can be applied to any data set that has similar data elements (such as laboratory and radiology orders and results, clinical activity, and prescription orders). The pilot synthetic outbreak records were for tularemia, but our approach may be adapted to other infectious diseases. The pilot synthetic background records were for the 4-11-year age group; the adaptations that must be made to the algorithms to produce synthetic background EMRs for other age groups are described.
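The three METHODS steps can be illustrated with a small Python pipeline. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`make_patients`, `mine_care_patterns`, `synthesize_record`), the name lists, the `age_band` rule and the toy encounters are all hypothetical. A "care pattern" is simplified here to the ordered list of care events (orders, results, prescriptions) observed in one real encounter for the same health problem and age band, and "adaptation" is simplified to re-dating that template relative to a synthetic visit date and attaching the synthetic patient's ID.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Patient:
    patient_id: str
    name: str
    sex: str
    age: int


@dataclass
class CareEvent:
    day_offset: int  # days after the index visit
    kind: str        # e.g. "lab_order", "radiology_order", "prescription"
    detail: str


# Step 1 (hypothetical): synthetic identity and basic information.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Riley", "Casey"]
LAST_NAMES = ["Smith", "Lee", "Garcia", "Brown", "Patel"]


def make_patients(n, age_range, rng):
    """Create n synthetic patients with random names, sex and age in age_range (inclusive)."""
    return [
        Patient(
            patient_id=f"SYN{i:05d}",
            name=f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
            sex=rng.choice(["F", "M"]),
            age=rng.randint(*age_range),
        )
        for i in range(n)
    ]


# Step 2 (hypothetical): group real (de-identified) encounters into care patterns.
def age_band(age):
    """Assumed banding rule: only the 4-11 group is distinguished here."""
    return "4-11" if 4 <= age <= 11 else "other"


def mine_care_patterns(real_encounters):
    """Map (problem, age band) to a list of time-ordered event sequences."""
    patterns = defaultdict(list)
    for problem, age, events in real_encounters:
        ordered = sorted(events, key=lambda e: e.day_offset)
        patterns[(problem, age_band(age))].append(ordered)
    return patterns


# Step 3 (hypothetical): adapt one sampled pattern to a synthetic patient.
def synthesize_record(patient, problem, index_date, patterns, rng):
    """Copy a matching pattern, re-dated from index_date; empty if none matches."""
    candidates = patterns.get((problem, age_band(patient.age)))
    if not candidates:
        return []
    template = rng.choice(candidates)
    return [
        {
            "patient_id": patient.patient_id,
            "date": index_date + timedelta(days=e.day_offset),
            "kind": e.kind,
            "detail": e.detail,
        }
        for e in template
    ]


# Toy usage with made-up encounters (illustration only, not real EMR data).
rng = random.Random(42)
toy_real = [
    ("fever", 6, [CareEvent(1, "lab_result", "CBC normal"), CareEvent(0, "lab_order", "CBC")]),
    ("fever", 9, [CareEvent(0, "radiology_order", "chest X-ray")]),
]
patterns = mine_care_patterns(toy_real)
patients = make_patients(3, (4, 11), rng)
records = [synthesize_record(p, "fever", date(2024, 1, 15), patterns, rng) for p in patients]
```

Sampling a whole real encounter as a template keeps the relative ordering and timing of orders and results intact; the actual pattern-identification and adaptation procedures used in the paper are not detailed in this abstract.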
Similar resources
Comparing Medical Comorbidities Between Opioid and Cocaine Users: A Data Mining Approach
Background: Prescription drug monitoring programs (PDMPs) are instrumental in controlling opioid misuse, but opioid users have increasingly shifted to cocaine, creating a different set of medical problems. While opioid use results in multiple medical comorbidities, findings of the existing studies reported single comorbidities rather...
Evaluation of Barriers and Facilitators Affecting the Implementation of Electronic Health Records in Iran
Introduction: Despite the development of information technology in the field of health, the process of creating and using electronic health records is still difficult. Therefore, identifying the barriers to implementing this system helps to eliminate them and to adopt effective implementation strategies. Methods and Materials: The present study is a review article and the research populati...
Open data models for smart health interconnected applications: the example of openEHR
BACKGROUND Smart Health is known as a concept that enhances networking, intelligent data processing and combining patient data with other parameters. Open data models can play an important role in creating a framework for providing interoperable data services that support the development of innovative Smart Health applications profiting from data fusion and sharing. METHODS This article descr...
OPAL: a clinician driven point of care observational data management consortium.
A vast amount of important information on the various rheumatic diseases that the rheumatologist treats is available in the medical records derived from the patient consultation. Until recently, it has been difficult to assemble and interpret this data. Moreover, the 'everyday' rheumatologist seeing the 'everyday' patient often does not contribute data to a better understanding of 'everyday' clin...
Archetypes: Knowledge Models for Future-proof Systems
Most information systems today are built using “single-level” methodologies, in which both informational and knowledge concepts are built into one level of object and data models. In domains characterised by complexity, large numbers of concepts, and/or a high rate of definitional change, systems based on such models are expensive to maintain and usually have to be replaced after a few years. H...